Advanced Analytics and Real-Time Data Processing in Apache Spark

Accredited by CPD & iAP | FREE PDF Certificate Included | Unlimited Access for 365 Days | Quality Study Materials


Global Edulink

Summary

Price
£12 inc VAT
Study method
Online
Duration
4 hours · Self-paced
Access to content
365 days
Qualification
No formal qualification
CPD
4 CPD hours / points
Additional info
  • Exam(s) / assessment(s) is included in price
  • TOTUM card available but not included in price

Overview

Apache Spark is a unified analytics engine used for processing and analysing big data. It has gained recognition within large organisations for its speed, ease of use, standard interface and real-time data processing features, and learning it could give you a real advantage when entering the data analysis or data science field. If you want to learn advanced analytics and real-time processing in Apache Spark, this course is a good place to start: it covers the key aspects of Apache Spark you need to begin processing and analysing big data.

This professionally narrated course starts by diving into the architecture and components of Spark Streaming, showing how it divides live data streams into batches that the Spark engine processes to generate the final results in batches. You will then explore the use cases for Spark Streaming applications so you can apply it appropriately, along with the Spark Streaming word count example and the Spark Streaming API. The challenging task of handling out-of-order events when building streaming applications is also given due attention.

Our highly talented tutors will then guide you through creating a project with Spark’s MLlib library, giving you more hands-on experience with the framework. You will then explore the components and operations of Spark GraphX and use it to create graphs for analysis, followed by a chapter on SparkR and its distributed data frame implementation. To top it all off, you will learn how to send real-time notifications when a user wants to buy a product on an e-commerce site. By the end of this course, you will have a firm grasp of Apache Spark and be able to apply its advanced analytics and real-time data processing capabilities in your career.

Why study at Global Edulink?

Global Edulink offers the most convenient path to gaining recognised skills and training, giving you the opportunity to put your knowledge and expertise into practice in an IT or corporate environment. You can study at your own pace, and you will be provided with all the necessary materials, tutorials, qualified course instructors, narrated e-learning modules and free resources, including a free CV writing pack, free career support and a course demo, to make your learning experience more enriching and rewarding.

CPD

4 CPD hours / points
Accredited by The CPD Certification Service

Description

COURSE CURRICULUM

Module 01 : Spark Streaming

  • The Course Overview
  • Introducing Spark Streaming
  • Streaming Context
  • Processing Streaming Data
  • Use Cases
  • Spark Streaming Word Count Hands-On
  • Spark Streaming – Understanding Master URL
  • Integrating Spark Streaming with Apache Kafka
  • mapWithState Operation
  • Transform and Window Operation
  • Join and Output Operations
  • Output Operations – Saving Results to Kafka Sink

Module 02 : Advanced Streaming and Use Cases

  • Handling Time in High Velocity Streams
  • Connecting External Systems That Work with an At-Least-Once Guarantee – Deduplication
  • Building Streaming Applications – Handling Events That Are Not in Order
  • Filtering Bots from Stream of Page View Events

Module 03 : Spark MLlib and ML Pipelines

  • Introducing Machine Learning with Spark
  • Feature Extraction and Transformation
  • Transforming Text into Vector of Numbers – ML Bag-of-Words Technique
  • Logistic Regression
  • Model Evaluation
  • Clustering
  • Gaussian Mixture Model
  • Principal Component Analysis and Distributing the Singular Value Decomposition (SVD)
  • Collaborative Filtering – Building Recommendation Engine

Module 04 : Spark GraphX

  • Introducing Spark GraphX – How to Represent a Graph?
  • Limitations of Graph-Parallel System – Why Spark GraphX?
  • Importing GraphX
  • Create a Graph Using GraphX and Property Graph
  • List of Operators
  • Perform Graph Operations Using GraphX
  • Triplet View

Module 05 : Performing Spark GraphX Operations

  • Perform Subgraph Operations
  • Neighbourhood Aggregations – Collecting Neighbours
  • Counting Degree of Vertex
  • Caching and Uncaching
  • GraphBuilder
  • Vertex and Edge RDD
  • Structural Operators – Connected Components

Module 06 : SparkR

  • Introduction to SparkR and How It’s Used
  • Setting Up from RStudio
  • Creating Spark DataFrames from Data Sources
  • SparkDataFrame Operations – Grouping and Aggregation
  • Run a Given Function on a Large Dataset Using dapply or dapplyCollect
  • Running a Function on a Large Dataset Grouped by Input Column(s) Using gapply or gapplyCollect
  • Run Local R Functions Distributed Using spark.lapply
  • Running SQL Queries from SparkR

Module 07 : Analytical Use Cases

  • PageRank Using Spark GraphX
  • Sending Real-Time Notifications When a User Wants to Buy a Product on the E-Commerce Site

Access Duration

The course will be delivered directly to you, and you will have 12 months’ access to the online learning platform from the date you join the course. The course is self-paced, so you can complete it in stages and revisit the lectures at any time.

Method Of Assessment

The course is assessed online with a final, multiple-choice test, which is marked automatically. You will know instantly whether you have passed the course.

Certification

Those who pass this test will receive a certificate in Advanced Analytics and Real-Time Data Processing with Apache Spark.

Other benefits

  • High-quality e-learning study materials and mock exams.
  • Tutorials/materials from industry-leading experts.
  • 24/7 Access to the Learning Portal.
  • The option to apply for a TOTUM extra discount card.
  • Recognised, CPD-accredited certificate.
  • Excellent customer service and administrative support.

Who is this course for?

This course may interest individuals looking to master advanced analytics and real-time data processing in order to enter or progress within the data analysis or data science field.

Requirements

  • Learners must be aged 16 or over and should have a basic understanding of the English language, numeracy, literacy and ICT.
  • A basic knowledge of Spark programming, Apache Spark and real-time data processing is required to follow this course.

Career path

Listed below are a few of the jobs this certificate could benefit you in, along with the average UK salary for each role.

  • Data analyst – £25,972 per annum
  • Data scientist – £35,226 per annum
  • Data manager – £29,986 per annum
  • Data analysis manager – £37,349 per annum
  • Data engineer – £41,223 per annum

FAQs

Study method describes the format in which the course will be delivered. At Reed Courses, courses are delivered in a number of ways, including online courses, where the course content can be accessed online remotely, and classroom courses, where courses are delivered in person at a classroom venue.

CPD stands for Continuing Professional Development. If you work in certain professions or for certain companies, your employer may require you to complete a number of CPD hours or points, per year. You can find a range of CPD courses on Reed Courses, many of which can be completed online.

A regulated qualification is delivered by a learning institution which is regulated by a government body. In England, the government body which regulates courses is Ofqual. Ofqual regulated qualifications sit on the Regulated Qualifications Framework (RQF), which can help students understand how different qualifications in different fields compare to each other. The framework also helps students to understand what qualifications they need to progress towards a higher learning goal, such as a university degree or equivalent higher education award.

An endorsed course is a skills-based course which has been checked and approved by an independent awarding body. Endorsed courses are not regulated, so they do not result in a qualification; however, the student can usually purchase a certificate showing the awarding body's logo if they wish. Certain awarding bodies, such as Quality Licence Scheme and TQUK, have developed endorsement schemes to help students select the best skills-based courses for them.